Exercise: Normalization

The Range of Values

One thing you may have noticed, especially when creating density plots and comparing feature values, is that that the values in each feature column are in quite a wide range; some are very large numbers or small, floating points.

Now, our end goal is to cluster this data and clustering relies on looking at the perceived similarities and differences between features! So, we want our model to be able to look at these columns and consistently measure the relationships between features.

To make sure the feature measurements are consistent and comparable, you’ll scale all of the numerical features into a range between 0 and 1. This is a pretty typical normalization step.

Below, is what the exercise looks like in the main notebook.

EXERCISE: Normalize the data

You need to standardize the scale of the numerical columns in order to consistently compare the values of different features. You can use a MinMaxScaler to transform the numerical values so that they all fall between 0 and 1.

# scale numerical features into a normalized range, 0-1
# store them in this dataframe
counties_scaled = None

Try to complete this task on your own , in the exercise notebook, and if you get stuck or want to consult a solution, I’ll go over my solution, next.